
(a) The ROC curves of the Jackknife CART and C5.0 models for the factor Xa protease cleavage data. Their AUC values were 0.856 and 0.916, respectively. (b) The sequence logos generated for the factor Xa protease data using the ggseqlogo package. The upper panel is the sequence logo of the non-cleaved peptides and the lower panel is the sequence logo of the cleaved peptides.
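The AUC values in panel (a) summarize each ROC curve as a single number. As a minimal sketch in plain Python (not the software used to produce the figure), the AUC can be computed as the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counting one half. This equals the area under the step-wise ROC curve obtained by lowering the decision threshold one score at a time. The function names and the eight scores below are hypothetical; in a jackknife evaluation each score would be the prediction for a peptide from a model trained on all the other peptides, which is assumed and not shown here.

```python
from itertools import product


def roc_auc(scores, labels):
    """AUC as P(score of a positive > score of a negative); ties count 0.5.

    labels: 1 = positive class (e.g. cleaved), 0 = negative class.
    Assumes a higher score means "more likely positive".
    """
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p, n in product(pos, neg):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos) * len(neg))


def roc_points(scores, labels):
    """(FPR, TPR) points from lowering the threshold one distinct score at a time."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Examples with equal scores cross the threshold together
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def trapezoid_area(points):
    """Area under a piecewise-linear curve given as (x, y) points sorted by x."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical scores for eight peptides (1 = cleaved, 0 = non-cleaved)
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 0, 1, 0, 0]
print(roc_auc(scores, labels))                     # 0.8125
print(trapezoid_area(roc_points(scores, labels)))  # 0.8125
```

The two printed values agree, illustrating that the rank-based definition and the area under the ROC curve are the same quantity.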

The random forest algorithm

The aforementioned decision tree algorithms employ the orthogonal partitioning approach, i.e., each partitioning rule involves only one variable and compares the value of that variable against a constant, such as x < T. With this kind of partitioning strategy, a complex data space may need many partitioning rules to generate a decision tree model. Figure 3.47(a) shows a data space containing two classes of data points, and Figure 3.47(b) shows a decision tree constructed for this data set using the orthogonal partitioning approach with three partitioning rules. Neither the first nor the third partitioning rule generates pure subspaces. Consequently, only one subspace is pure, i.e., contains data points of a single class.
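The orthogonal partitioning approach can be sketched in plain Python as below. This is an illustrative assumption, not the implementation behind Figure 3.47: every candidate rule has the form x_j < T, thresholds T are midpoints between neighbouring values of a variable, and rules are scored by the size-weighted Gini impurity of the two resulting subspaces (the criterion used by CART; C5.0 uses an entropy-based gain ratio instead). The tree keeps splitting until every subspace is pure. The toy data set and all function names are hypothetical: a 10 × 10 grid of points whose two classes are separated by the diagonal line x_1 = x_0.

```python
from collections import Counter


def gini(labels):
    """Gini impurity 1 - sum_k p_k**2 of a list of class labels; 0.0 means pure."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_orthogonal_split(X, y):
    """Return (weighted impurity, j, T) for the best rule x_j < T, or None if
    no rule can separate the points (all feature values identical)."""
    best = None
    for j in range(len(X[0])):
        values = sorted({row[j] for row in X})
        # Candidate thresholds: midpoints between consecutive distinct values
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [lab for row, lab in zip(X, y) if row[j] < t]
            right = [lab for row, lab in zip(X, y) if row[j] >= t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, j, t)
    return best


def grow(X, y):
    """Split recursively until every leaf is pure.

    A leaf is ("leaf", label); an internal node is ("node", j, T, left, right).
    """
    if len(set(y)) == 1:
        return ("leaf", y[0])
    split = best_orthogonal_split(X, y)
    if split is None:  # identical points with different labels: majority leaf
        return ("leaf", Counter(y).most_common(1)[0][0])
    _, j, t = split
    go_left = [row[j] < t for row in X]
    left = grow([r for r, g in zip(X, go_left) if g],
                [lab for lab, g in zip(y, go_left) if g])
    right = grow([r for r, g in zip(X, go_left) if not g],
                 [lab for lab, g in zip(y, go_left) if not g])
    return ("node", j, t, left, right)


def predict(tree, x):
    """Follow the rules x_j < T from the root down to a leaf."""
    while tree[0] == "node":
        _, j, t, left, right = tree
        tree = left if x[j] < t else right
    return tree[1]


def count_leaves(tree):
    if tree[0] == "leaf":
        return 1
    return count_leaves(tree[3]) + count_leaves(tree[4])


# Hypothetical toy data: grid points (x0, x1), class 1 above the diagonal x1 = x0
X = [(i, j) for i in range(10) for j in range(10) if i != j]
y = [1 if j > i else 0 for i, j in X]

tree = grow(X, y)
print("leaves:", count_leaves(tree))
print("training accuracy:",
      sum(predict(tree, x) == lab for x, lab in zip(X, y)) / len(y))
```

A single straight line separates the two classes here, yet each orthogonal rule can only cut the space parallel to an axis, so the staircase-shaped boundary forces many rules: on this grid, at least ten leaves are needed before every subspace is pure.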

This is shown in Figure 3.47(b), where only one leaf node is coloured by a single colour. For instance, the left subspace generated by the first partitioning rule, which compares x against a threshold, is composed of 17 data points of one class and one data point of the other class. Clearly, if all subspaces were required to be pure, the derived decision tree would be very complex.
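To put a number on how impure that left subspace is, the arithmetic below evaluates two common impurity measures for a subspace holding 17 points of one class and 1 point of the other. The text does not say which measure the tree in Figure 3.47(b) uses, so both are shown only as an illustration; for comparison, an evenly mixed two-class subspace has a Gini impurity of 0.5 and an entropy of 1 bit.

```latex
\begin{aligned}
\mathrm{Gini} &= 1 - \left(\tfrac{17}{18}\right)^2 - \left(\tfrac{1}{18}\right)^2
               = 1 - \tfrac{290}{324} = \tfrac{17}{162} \approx 0.105,\\
H &= -\tfrac{17}{18}\log_2\tfrac{17}{18} - \tfrac{1}{18}\log_2\tfrac{1}{18}
   \approx 0.078 + 0.232 \approx 0.310\ \text{bits}.
\end{aligned}
```

Both values are small but nonzero, which is why this leaf is not coloured by a single colour; making it pure would require at least one more partitioning rule.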